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High-throughput sequencing reveals extraordinary 
fluidity of miRNA, piRNA, and siRNA pathways 
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The nematode Caenorhabditis elegans contains each of the broad classes of eukaryotic small RNAs, including microRNAs 
(miRNAs), endogenous small-interfering RNAs (endo-siRNAs), and piwi-interacting RNAs [piRNAs). To better un- 
derstand the evolution of these regulatory RNAs, we deep-sequenced small RNAs from C. elegans and three closely related 
nematodes: C. briggsae, C. remanei, and C. brenneri. The results reveal a fluid landscape of small RNA pathways with essentially 
no conservation of individual sequences aside from a subset of miRNAs. We identified 54 miRNA families that are 
conserved in each of the four species, as well as numerous miRNAs that are species-specific or shared between only two or 
three species. Despite a lack of conservation of individual piRNAs and siRNAs, many of the features of each pathway are 
conserved between the different species. We show that the genomic distribution of 26G siRNAs and the tendency for 
piRNAs to cluster is conserved between C. briggsae and C. elegans. We also show that, in each species, 26G siRNAs trigger 
stage-specific secondary siRNA formation. piRNAs in each species also trigger secondary siRNA formation from targets 
containing up to three mismatches. Finally, we show that the production of male- and female-specific piRNAs is conserved 
in all four species, suggesting distinct roles for piRNAs in male and female germlines. 

[Supplemental material is available for this article.] 



Small noncoding RNAs, including microRNAs (miRNAs), piwi- 
interacting RNAs (piRNAs), and endogenous small-interfering 
RNAs (endo-siRNAs), regulate developmental and defense path- 
ways in animals. Each class of small RNAs has unique roles and 
genetic requirements but invariably binds to Argonaute proteins 
to form effector complexes that target nucleic acids containing 
partial or complete complementarity to the small RNA guide. 
Caenorhabditis elegans contains 25 Argonautes belonging to three 
different clades and multiple distinct subfamilies (Yigit et al. 
2006). 

miRNAs are —22 nt and repress gene expression through 
mRNA decay and translational repression (Bartel 2004). miRNAs 
are typically processed from long primary transcripts through 
the sequential activities of the ribonucleases Drosha and Dicer. 
Mature miRNAs associate with the AGO clade Argonautes ALG-1 
and ALG-2 in C. elegans (Grishok et al. 2001). Many miRNAs 
are conserved between C. elegans and humans, including miR-1 
and let- 7, which share 100% sequence identity across species 
(Pasquinelli et al. 2000; Lee and Ambros 2001). 

C. elegans piRNAs (also called 21U-RNAs) are 21 nt long and 
contain a uracil at their 5' termini (Ruby et al. 2006; Batista et al. 
2008; Das et al. 2008). piRNAs bind to the two PlWI-clade Argo- 
nautes PRG-1 and PRG-2. Mutations in prg-1 result in defects in 
the production and/or functionality of germ cells and reduced 
fertility at elevated temperatures (Batista et al. 2008; Wang and 
Reinke 2008). piRNAs are thought to interact through imperfect 
complementarity with target mRNAs in the germline to trigger 
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secondary siRNA production (Bagijn et al. 2012; Lee et al. 2012; 
Shirayama et al. 2012) 

The majority of C. elegans siRNAs are either 22 or 26 nt long 
and start with a 5 ' guanine and are thus referred to as 22G-RNAs 
(22G siRNAs) or 26G-RNAs (26G siRNAs), respectively. The 22G 
siRNAs are produced by the RNA-dependent RNA polymerases 
(RdRPs) RRF-1 and EGO-1 and bind to the WAGO clade Argonautes 
WAGO-1-12 (WAGO class siRNAs) to silence certain protein-coding 
genes, transposons, pseudogenes, and cryptic loci (Gu et al. 
2009) . A subset of 22G siRNAs produced by EGO- 1 associate with 
the Argonaute CSR-1 (CSR-1 class siRNAs) to guide chromosome 
segregation (Claycomb et al. 2009; van Wolfswinkel et al. 2009). 
It was proposed that CSR-1 class siRNAs may also provide a 
memory of self to protect endogenous genes from being routed 
into the piRNA and WAGO class siRNA pathways (Lee et al. 2012; 
Shirayama et al. 2012). 

The 26G siRNAs fall into two classes: a spermatogenesis- 
enriched class which associates with the AGO clade Argonautes, 
ALG-3 and ALG-4 (Han et al. 2009; Conine et al. 2010); and an 
oocyte- and embryo-enriched class which associates with the di- 
vergent PlWI-clade Argonaute, ERGO-1 (Han et al. 2009; Vasale 
et al. 2010; Fischer et al. 2011). Both classes of 26G siRNAs are 
produced by the RdRP RRF-3 and are thought to trigger secondary 
22G siRNA production. However, the majority of 22G siRNAs are 
produced independently of a 26G siRNA trigger. 

Many miRNAs are highly conserved in Caenorhabditis spe- 
cies (de Wit et al. 2009). In contrast, individual piRNA sequences 
are entirely nonconserved between C. elegans and C. briggsae 
(Ruby et al. 2006; de Wit et al. 2009) . It is unclear if other features 
of the piRNA pathway are conserved among nematode species. It 
is also unknown whether or not the various classes of endoge- 
nous siRNAs are conserved in Caenorhabditis species. Here, we 
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sequenced small RNAs from C. elegans, C. briggsae, C. remanei, and 
C. brenneri, and using a comparative genomics approach, we 
examined the conservation and evolution of each of the small 
RNA pathways. We identified hundreds of new miRNAs in 
C. briggsae, C. remanei, and C. brenneri, as well as thousands of 
piRNAs and each of the classes of endogenous siRNAs present in 
C. elegans. Although there is essentially no conservation of in- 
dividual small RNAs among any of the four species aside from 
miRNAs and a very limited number of CSR-1 class siRNAs, many 
features such as their genomic distribution, expression patterns, 
and tendency for 26G siRNAs and piRNAs to trigger secondary 
22G siRNA production are conserved. We also identified a new 
subfamily of RdRPs absent in C. elegans but conserved in each of 
the other three species. Finally, we discovered that production of 
distinct sex-specific populations of piRNAs is conserved across 



nematode species, suggesting different roles for piRNAs in male 
and female germlines. 

Results 

Expansion and diversification of Argonautes and RNA- 
dependent RNA polymerases 

To examine small RNA evolution in nematodes, we subjected small 
RNAs from C. elegans and three other nematode species, C. briggsae, 
C. remanei, and C. brenneri, to high-throughput sequencing (Fig. 
1A; Supplemental Table SI). Each of the four species is morpho- 
logically similar; however, their genomic sequences are highly di- 
vergent, with common ancestry —110 million generations ago 
(Fig. IB; Cutter et al. 2009). We constructed 18- to 28-nt small RNA 
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Figure 1 . Identification of small RNAs and associated proteins in Caenorhabditis species. (A) Bar graphs show the size and 5' first nucleotide distributions 
of small RNAs. (Insets) Pie charts display the proportion of reads belonging to different classes of small RNAs: miRNAs, piRNAs, and siRNAs, or unclassified. 
(B) Phylogenetic relationship of the elegans group of Caenorhabditis, adapted from Kiontke et al. (201 1 ). C. briggsae, C. remanei, C. brenneri, and C. elegans 
(highlighted in red), having had their genomes sequenced and partially annotated, were chosen for this study. (C) Phylogeny of RNA-dependent RNA 
polymerases. Genes from different species are color-coded, and different subfamilies are indicated by colored branches. (Cel) C. elegans, (Cbf) C. briggsae, 
(Cre) C. remanei, (Cbn) C. brenneri. (D) RT-PCR of rrf-4 from mixed-stage C briggsae, C remanei, and C brenneri. 
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libraries from synchronized populations of young gravid adult 
hermaphrodites for the androdioecious (male/hermaphrodite) 
species C. elegans and C. briggsae and from mixed populations of 
adult males and females for the gonochoristic (male/female) spe- 
cies C. remanei and C. brenneri. For each species, we also sequenced 
small RNAs from embryos and young adult males. In each library 
~80%-90% of small RNAs were 21-23 nt and contained either a 
5' U or 5' G, features indicative of piRNAs, miRNAs, and 22G siRNAs 
(Fig. 1A). Each of the libraries, but particularly male and embryo 
libraries, also had a small fraction of 26-nt 5' G-containing small 
RNAs (26G siRNAs) (Fig. 1A). 

In C. elegans, small RNAs can be classified by their genetic 
requirements, particularly their Argonaute binding partners, but 
for siRNAs, the RNA-directed RNA polymerase that produces them 
is also indicative of their function. To determine the potential for 
each class of C. elegans small RNAs to be present in other nema- 
todes and, thus, in our small RNA libraries, we identified all 
Argonautes and RdRPs in C. elegans, C. briggsae, C. remanei, and 
C. brenneri and examined their phylogenetic relatedness. For each 
of the 25 Argonaute proteins in C. elegans, we identified at least 
one, but often multiple, likely orthologs in each of the other three 
nematode species, suggesting that a similar repertoire of small RNA 
pathways exists in each species (Table 1; Supplemental Fig. SI). 

There are four RdRPs in C. elegans: RRF-3 functions in the 
production of 26G siRNAs (Gent et al. 2010), EGO-1 and RRF-1 have 
partially overlapping roles in 22G siRNA formation (Gu et al. 2009), 
and RRF-2, which is closely related to RRF-1, has an unknown role 
(Sijen et al. 2001). From a phylogenetic analysis of nematode RdRPs, 
we identified likely orthologs of the three RdRP subfamilies, RRF-1, 
RRF-3, and EGO-1 in each of the other three nematode species (Fig. 
1C). We also identified a fourth subfamily of RdRPs, RRF-4, in each 
of the other three nematode species but not in C. elegans (Fig. 1C). 
By RT-PCR assays from mixed-stage animals, we could readily detect 
expression of Cbr-rrf-4/CBG04006, Cre-rrf-4.1/CRE23681, and Cbn- 
rrf-4.1/CBN18731, but not Cre-rrf-4.2/CRE24272 or Cbn-rrf-4.2/ 
CBN 30970 (Fig. ID), suggesting that at least one gene in the rrf-4 
family is expressed in each species except C. elegans. 

The majority of Caenorhabditis miRNA families within each 
species are conserved 

miRNAs in C. elegans are well-characterized (Lau et al. 2001; Lee 
and Ambros 2001; Grad et al. 2003; Lim et al. 2003; Ruby et al. 



2006; Kato et al. 2009; Gerstein et al. 2010); however, miRNAs in C. 
briggsae and C. remanei have been only partially characterized (de 
Wit et al. 2009), and miRNAs in C. brenneri have not been exam- 
ined. To identify miRNAs in each of the four species, we used the 
miRDeep2 program (Mackowiak 2011; Friedlander et al. 2012). To 
conclusively call a miRNA, we required a candidate predicted by 
miRDeep2 to also be predicted by MIReNA (Mathelier and Carbone 
2010) and/or contain a seed sequence (positions 2-7) conserved 
between Caenorhabditis species. As validation of the approach, we 
identified the majority of annotated Caenorhabditis miRNAs from 
our deep sequencing libraries (Supplemental Table S2). We iden- 
tified 37 new miRNAs in C. briggsae, 48 new miRNAs in C. remanei, 
and 215 new miRNAs in C. brenneri (Supplemental Table S3) but 
did not identify any new miRNAs in C. elegans. We grouped all 
known and newly identified miRNAs into families, based on seed 
sequence. To date, there are 106 miRNA families annotated in C. 
elegans, 84 in C. briggsae, 85 in C. remanei, and 87 in C. brenneri, 
which constitute 176 distinct miRNA families (Fig. 2A,B; Supple- 
mental Table S4). Fifty-four miRNA families are conserved in all 
four species, 61 families are present in three or more species, and 
most miRNA families within each species have homologs in at least 
two other species (Fig. 2C). Few (<5%) miRNA families are con- 
served between only two or three nematode species (Fig. 2C). 
However, in each species, at least 20% of miRNA families are 
unique, suggesting that miRNAs are born at relatively high rates 
(Fig. 2C). 

Among conserved miRNA families, ~40%-60% have multi- 
ple members within a species, whereas <13% of nonconserved 
families contain multiple members, suggesting that ancient 
miRNA families expand to confer robustness in gene regulatory 
networks (Fig. 2D). A striking example of this is the miR-35 family 
which has expanded to contain at least eight members in each 
species and as many as 27 members in C. brenneri (Fig. 2A). The 
miR-35 family is one of the few miRNA families essential for de- 
velopment (Alvarez-Saavedra and Horvitz 2010). Each of the other 
families essential for development, including miR-51, miR-58, and 
let- 7, also contain multiple (>4) members in each species (Fig. 2A). 

Most miRNAs are processed from primary transcripts in 
sequential steps involving the ribonucleases Drosha and Dicer. 
However, some miRNAs are, instead, derived from short intronic 
hairpins called mirtrons during splicing, thereby bypassing Drosha 
cleavage (Okamura et al. 2007; Ruby et al. 2007). We found that out 
of the 15 C. elegans annotated mirtrons (Chung et al. 2011), only 



Table 1. Argonuates and their associated small RNAs in Caenorhabditis 



Number of genes in each class of Argonautes 



Class 



Clade 



Associated small RNAs 



C. elegans 



C. briggsae 



C. remanei 



C. brenneri 



ALG-1 


AGO 


miRNAs 


1 


1 


1 


1 


ALG-2 


AGO 


miRNAs 


1 


1 


3 


1 


HPO-24 


AGO 


Unknown 


1 


1 


1 


2 


ALG-3/4 


AGO 


Sperm 26G siRNAs 


2 


1 


1 


1 


ERGO-1 


PIWI 


Oocyte/embryo 26G siRNAs 


1 


4 


4 


9 


PRG-1/2 


PIWI 


piRNAs/21 U-RNAs 


2 


1 


4 


1 


CSR-1 


WAGO 


22G siRNAs 


1 


1 


1 


3 


RDE-1 


WAGO 


Exogenous siRNAs, miR-243 


1 


1 


2 


2 


C04F12.1 


WAGO 


22G siRNAs 


1 


1 


3 


2 


WAGO-1 -5 


WAGO 


22G siRNAs 


5 


3 


4 


7 


WAGO-6-8 


WAGO 


22G siRNAs 


4 


5 


6 


11 


NRDE/HRDE 


WAGO 


Nuclear 22G siRNAs 


5 


4 


7 


6 


Total 






25 


24 


37 


46 
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Conserved miRNAs* 



B 



Nonconserved miRNAs* 



*Numbers in columns are total family members. 

^ miRNA conservation between species 

C. briggsae C. remanei 

/ A \ 21 16 / 

C. elegans 1 2 ^ 

44 \ 2 ^ 4 / 

3 S 54 \ 0 

1 0 
1 



Family 


Seed 








<^ F ■, 

v> Family 


Seed 


\j~ 




Family 


Seed 


vr 


x 






miR-35 


CACCGG 


8 


17 


17 


27 


miR-358 


UUGGUA 


1 


0 


1 


0 


miR-7591 


AUCGGU 


0 


1 


0 


0 


let-7 


GAGGUA 


7 


9 


6 


8 


miR-788 


CCGCUU 


1 


0 


1 


0 


miR-7592 


UCGAAA 


0 


1 


0 


0 


miR-51 


ACCCGU 


6 


4 


6 


7 


miR-1 820 


UUUGAU 


1 


0 


1 


0 


miR-789 


ACGUGG 


0 


1 


0 


0 


miR-58 


GAGAUC 


6 


5 


6 


6 


miR-392 


AUCAUC 


1 


0 


0 


1 


miR-788 


AAUGGG 


0 


1 


0 


0 


miR-63 


AUGACA 


5 


5 


8 


26 


miR-7615 


GAUCUA 


0 


0 


0 


2 


miR-65 


CUACCG 


0 


1 


0 


0 


miR-1 


GGAAUG 


4 


2 


1 


2 


miR-7616 


GUAGUC 


0 


0 


0 


2 


miR-54a 


GUAUUC 


0 


1 


0 


0 


miR-44 


GACUAG 


4 


6 


4 


5 


miR-7617 


UAAGUC 


0 


0 


0 


2 


miR-38 


CCGGGU 


0 


1 


0 


0 


miR-72 


GGCAAG 


4 


5 


5 


9 


miR-7618 


AAAACA 


0 


0 


0 


1 


miR-35b 


GCCAAU 


0 


1 


0 


0 


miR-87 


UGAGCA 


4 


2 


2 


3 


miR-7619 


AUGCCG 


0 


0 


0 


1 


miR-353 


AAGUAU 


0 


1 


0 


0 


miR-2 


AUCACA 


3 


3 


3 


6 


miR-7620a 


AUUGUA 


0 


0 


0 


1 


miR-2224 


UGAGGU 


0 


1 


0 


0 


miR-238 


UUGUAC 


3 


4 


4 


7 


miR-7621a 


CAACAG 


0 


0 


0 


1 


miR-2214 


CUCGGU 


0 


1 


0 


0 


miR-50 


GAUAUG 


3 


4 


3 


3 


miR-7621b 


CAACCG 


0 


0 


0 


1 


miR-2212 


GGCAGA 


1 


1 


0 


0 


lin-4 


CCCUGA 


2 


3 


3 


3 


miR-7622 


CAAUUC 


0 


0 


0 


1 


miR-2208 


AGUGUA 


2 


0 


0 


0 


miR-1 824 


GGCAGU 


2 


1 


2 


4 


miR-789b 


CAUCCU 


0 


0 


0 


1 


miR-258 


GUUUUG 


2 


0 


0 


0 


miR-231 


AAGCUC 


2 


2 


2 


2 


miR-789a 


CCUGCC 


0 


0 


0 


1 


miR-268 


GCAAGA 


2 


0 


0 


0 


miR-251 


UAAGUA 


2 


2 


2 


3 


miR-7623 


CUCUAA 


0 


0 


0 


1 


miR-789 


AUUGAU 


2 


0 


0 


0 


miR-357 


AAUGCC 


2 


3 


1 


2 


miR-7624 


GAGGUG 


0 


0 


0 


1 


miR-1018 


GAGAGA 


1 


0 


0 


0 


miR-46 


GUCAUG 


2 


2 


2 


2 


miR-7625 


GGACCU 


0 


0 


0 


1 


miR-1 020 


UUAUUC 


1 


0 


0 


0 


miR-49 


AGCACC 


2 


2 


2 


3 


miR-7626 


GGAGGU 


0 


0 


0 


1 


miR-1 021 


AGUGAG 


1 


0 


0 


0 


miR-75 


UAAAGC 


2 


1 


2 


3 


miR-7627 


GGGAUU 


0 


0 


0 


1 


miR-1 022 


AGAUCA 


1 


0 


0 


0 


miR-790 


UUGGCA 


2 




2 


2 


miR-7628 


GUACAA 


0 


0 


0 


1 


miR-1818 


GUGGUC 


1 


0 


0 


0 


miR-86 


AAGUGA 


2 




2 


2 


miR-7629 


GUGAUG 


0 


0 


0 


1 


miR-1 821 


GAGGUC 


1 


0 


0 


0 


lsy-6 


UUUGUA 


1 


1 


1 


1 


miR-7630 


UACACU 


0 


0 


0 


1 


miR-1 822 


GAGCUG 


1 


0 


0 


0 


miR-124 


AAGGCA 


1 




2 


2 


miR-7631 


UAGAUC 


0 


0 


0 


1 


miR-1 829a 


AACCAU 


1 


0 


0 


0 


miR-228 


AUGGCA 


1 


1 


1 


3 


miR-230 


UAUUAG 


0 


0 


0 


1 


miR-1 829b 


AGCGAU 


1 


0 


0 


0 


miR-230 


CUUGGU 


1 


1 


1 


1 


miR-7632 


UGAUAU 


0 


0 


0 


1 


miR-1 829c 


AGCGAA 


1 


0 


0 


0 


miR-232 


AAAUGC 


1 






1 


miR-7633 


UGCCCA 


0 


0 


0 


1 


miR-1 830 


CUAGGA 


1 


0 


0 


0 


miR-234 


UAUUGC 


1 


1 


1 




miR-7634 


UUUCAC 


0 


0 


0 


1 


miR-1 832 


CAGCGA 


1 


0 


0 


0 


miR-235 


AUUGCA 


1 




1 




miR-7599 


GGGAUC 


0 


0 


3 


1 


miR-2209c 


AAAGAC 


1 


0 


0 


0 


miR-236 


AAUACU 


1 


1 


1 




miR-356 


UUGUUC 


0 


0 


1 


1 


miR-2210 


AAAGUC 


1 


0 


0 


0 


miR-242 


UGCGUA 


1 


1 


1 


1 


miR-7600 


AUACUU 


0 


0 


1 


1 


miR-2213 


AGCUGU 


1 


0 


0 


0 


miR-244 


CUUUGG 


1 


1 


1 


1 


miR-7602 


GAGAUU 


0 


0 


1 


0 


miR-2214 


AUUGAC 


1 


0 


0 


0 


miR-245 


UUGGUC 


1 


1 


1 




miR-7601 


UAAGUU 


0 


0 


1 


0 


miR-2216 


CACAUU 


1 


0 


0 


0 


miR-246 


UACAUG 


1 


1 






miR-58a 


GCCCUA 


0 


0 


1 


0 


miR-2217 


CGACCC 


1 


0 


0 


0 


miR-248 


UACACG 


1 


1 


1 


1 


miR-230 


AUUAGU 


0 


0 


1 


0 


miR-243 


AUCUCG 


1 


0 


0 


0 


miR-249 


CACAGG 


1 


1 


1 


1 


miR-2228 


CUAUUU 


0 


0 


1 


0 


miR-257 


AGUAUC 


1 


0 


0 


0 


miR-253 


UAGUAG 


1 


1 


1 


! 


miR-7603 


GUUGAG 


0 


0 


1 


0 


miR-260 


UGAUGU 


1 


0 


0 


0 


miR-254 


GCAAAU 


1 


1 


1 


1 


miR-7604 


GAUGUU 


0 


0 


1 


0 


miR-262 


UUUCUC 


1 


0 


0 


0 


miR-255 


AACUGA 


1 


1 


1 




miR-7605 


CUACCA 


0 


0 


1 


0 


miR-2953 


ACAGAA 


1 
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Figure 2. Conservation of miRNAs in nematodes. (A) Table of conserved miRNAs classified by family. Seed sequences are positions 2-7, relative to the 5' 
end of the miRNA. The number in each row represents the number of miRNAs in each family in each species. Shading indicates presence of at least one 
member of a family. (B) As in A, but nonconserved miRNAs. (C) Venn diagram shows the number of miRNA families and their overlap in each of the four 
nematode species. (D) The percentage of families having the indicated number of members is shown for conserved and nonconserved miRNAs in each 
species. (£) Table of mirtrons classified by the gene hosting the mirtron. 
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miR-62, embedded in the third intron of ugt-50, is conserved in the 
other three nematodes (Fig. 2E). miR-62 has 100% sequence con- 
servation in all four nematodes, and the conservation of the intron 
sequence itself is much higher than that of other ugt-50 introns 
(Supplemental Fig. S2). Although the role of miR-62 is unknown, 
the strong selective pressure to maintain it hints at an important 
function. In addition to miR-62, we identified one nonconserved 
mirtron in C. briggsae, one in C. remanei, and two present in 
C. briggsae, C. remanei, and C. brenneri but not in C. elegans (Fig. 2E). 

piRNAs share common features in their biogenesis 
and mechanism of action 

C. elegans piRNAs (21U-RNAs) are 21 nt long and contain a 5' U. 
Although the mechanism by which piRNAs are formed is unclear, 
they map downstream from conserved sequence motifs that may 
function as transcriptional promoters (Ruby et al. 2006; de Wit 
et al. 2009; Cecere et al. 2012). To identify piRNAs from C. briggsae, 
C. remanei, C. brenneri, and C. elegans, we filtered our data sets for 
small RNAs that were not identified in our miRNA analysis and 
that were 21 nt long and contained a 5' U. Consistent with pre- 
vious studies (Ruby et al. 2006; de Wit et al. 2009), we found that 
a GTTTC core motif was strongly overrepresented between -37 
to -47 nt upstream of the starting T in candidate piRNA coding 



sequences from each species. The nucleotide composition and the 
length of spacer sequence between the large and small motifs were 
nearly identical in all species, consistent with previous findings in 
C. briggsae and C. remanei (Supplemental Fig. S3; de Wit et al. 2009). 

Using a scoring matrix based on the consensus motif, spacer 
sequence length, and 5 ' U features of piRNAs, we scanned each of 
the four genomes for putative piRNA loci. The numbers of both 
predicted and observed piRNA loci were similar between C. elegans, 
with 17.6K predicted and 9.9K observed, and C. briggsae, with 
14. 8K predicted and 7.5K observed (Fig. 3A). C. remanei and 
C. brenneri possess far more piRNA loci than C. elegans and 
C. briggsae. In C. remanei, 33. 8K piRNAs were predicted and 23. 7K 
were observed, and in C. brenneri, 54. 8K piRNAs were predicted and 
29. 8K were observed (Fig. 3 A). A saturation analysis indicates that 
we captured the majority of piRNAs produced at these particular 
developmental stages (Supplemental Fig. S4). The genomes of 
C. remanei and C. brenneri are —35% larger than those of C. elegans 
and C. briggsae; however, this alone does not account for the 2-3 
times more piRNAs identified in C. remanei and C. brenneri. Inter- 
estingly, C. elegans and C. briggsae are both androdioecious (male/ 
hermaphrodite) species, whereas C. remanei and C. brenneri are 
both gonochoristic (male/female). This suggests that species that 
mate at every generation require a more complex repertoire of 
piRNAs for genome surveillance. 
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Figure 3. Functional and genomic features of piRNAs are conserved. (A) The numbers of piRNA loci predicted based on regulatory motifs and the 
numbers identified from high-throughput sequencing. (B) The percentage of protein-coding genes in each genome that could be targeted by piRNAs if 
zero, one, two, or three mismatches are allowed. (C) Density of small RNAs within a 1 00-nt window centered on the predicted target sites of the top 20% 
most abundant piRNAs. (Blue) Small RNAs that are antisense to the predicted targets; (red) those that are sense to the targets. (D) Distribution of observed 
(red) and predicted (blue) piRNA loci per 100-kb window in C. elegans (top) and C briggsae (bottom). There are two piRNA clusters on C. briggsae 
chromosome IV: the 0 to 6.9 Mb region largely in synteny with the two C. elegans piRNA clusters (highlighted in lines with arrows); and the 1 3.1 to 1 5.1 
Mb cluster. In addition, C briggsae has a third piRNA cluster on chromosome I at position 9.9 to 1 1 .3 Mb. 
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Although we identified nearly 70,000 total piRNAs in our 
deep sequencing data sets, not a single piRNA sequence was 
present in more than one species. We also assessed piRNA se- 
quence conservation when one, two, or three mismatches 
were allowed. Even with this less stringent criterion, only 0%, 
0.01%, and 0.1% of C. elegans piRNAs have potential homologs 
in C. briggsae allowing for one, two, and three mismatches, 
respectively. 

To identify potential piRNA targets, we aligned piRNA se- 
quences with all annotated protein-coding genes from each spe- 
cies. piRNA-target recognition is thought to be permissive to 
around three mismatches (Bagijn et al. 2012; Lee et al. 2012); thus, 
we did the analysis allowing for zero, one, two, or three mismatches 
(Fig. 3B). When up to three mismatches are allowed, —30% of 
C. elegans and —20% of C. briggsae genes are potential targets (Fig. 
3B). Although C. remanei and C. brenneri contain a substantially 
larger repertoire of piRNAs, the proportions of genes with po- 
tential piRNA targets are similar to that of C. elegans (Fig. 3B). 

C. elegans piRNAs can trigger the production of RdRP- 
dependent secondary siRNAs centered on and antisense to piRNA 
target sites (Bagijn et al. 2012; Lee et al. 2012). To determine if 
piRNAs trigger siRNA formation in the other three nematodes, we 
assessed both sense and antisense siRNA abundance at candidate 
piRNA target sites. When all observed piRNAs were included in 
the analysis, there was only a slight enrichment of siRNAs at 
predicted piRNA target sites (data not shown). However, when 
only the top 20% most abundant piRNAs were considered, we 
observed a substantial enrichment of siRNAs antisense, but not 
sense, to the predicted piRNA target sites in each species (Fig. 3C). 
Of these siRNAs, 70%-80% are 22G siRNAs. 

These results suggest that, although individual piRNAs are not 
conserved, the mechanism in which they are formed and their 
propensity to trigger secondary siRNA formation are conserved 
(Bagijn et al. 2012; Lee et al. 2012). 

C. elegans piRNAs are primarily derived from two broad 
clusters on chromosome IV (Fig. 3D; Ruby et al. 2006). We asked 
whether piRNA loci are also clustered in other species. We 
restricted our analysis to C. briggsae because C. remanei and 
C. brenneri DNA sequences have not yet been assembled into chro- 
mosomes. There is extensive conservation of chromosome organi- 
zation and synteny between C. elegans and C. briggsae (Stein et al. 
2003; Hillier et al. 2007). In C. briggsae, the syntenic regions of the 
two major C. elegans piRNA clusters on chromosome IV also pro- 
duced high levels of piRNAs (Fig. 3D; Ruby et al. 2006; de Wit et al. 
2009). The two regions that give rise to the C. elegans piRNA 
clusters on chromosome IV are rearranged in C. briggsae such that 
they are separated from one another by only ~1 Mb. Interestingly, 
the ~1-Mb region separating the two clusters also contains a high 
abundance of piRNA loci. Together, the region forms a continuous 
6.9-Mb piRNA cluster. We also identified a second piRNA cluster on 
chromosome IV (13.1-15.1 Mb) and another on chromosome I 
(9.9-11.3 Mb) specific to C. briggsae (Fig. 3D). The regions that give 
rise to these two C. briggsae-specific piRNA clusters lack continuous 
synteny with C. elegans, as determined by pairwise alignments. Two 
C. briggsae piRNA clusters previously identified on chromosomes 
I (7.8-9.5 Mb) and III (0-0.3 Mb) were represented by only average 
numbers of reads in our libraries and may have been artifacts of 
low sequencing depth (de Wit et al. 2009). That piRNAs in both 
C. elegans and C. briggsae tend to cluster and that conservation of 
these clusters appears to depend on long regions of continuous 
synteny suggests that the clusters are important for the birth of 
new piRNAs. 



Distinct populations of piRNAs in males and females 

In C. elegans, differences in the upstream regulatory motif drives 
male- and female-specific piRNA expression (M Freeberg and J Kim, 
pers. comm.). In each of the four Caenorhabditis species we ana- 
lyzed, we observed a highly significant positive correlation be- 
tween piRNA populations in early embryos and adult her- 
maphrodites or adult females+males (Fig. 4A). In contrast, piRNA 
levels between adult males and adult hermaphrodites or adult 
females+males are only modestly correlated and show a biphasic 
pattern of distribution indicative of two distinct classes (Fig. 4B). In 
C. elegans, we identified 9307 distinct piRNAs in hermaphrodites 
and 6065 distinct piRNAs in males. Of these, 5044 were enriched 
greater than threefold in hermaphrodites and 3336 were enriched 
greater than threefold in males. Only 1493 piRNAs had similar 
expression levels in hermaphrodites and males (Fig. 4C). Each of 
the other species also had distinct sets of piRNAs that were 
enriched in either males or hermaphrodites/females+males (Fig. 
4C), suggesting that the production of distinct classes of male and 
female piRNAs is conserved in nematodes. The upstream motifs 
and length of spacer between the large and small motifs are nearly 
identical between the two classes of piRNAs, although several po- 
sitions in the large motif and surrounding sequence show a stron- 
ger bias for a particular nucleotide in one class relative to the other 
(Supplemental Fig. S5). 

The genomic distributions of C. elegans hermaphrodite- and 
male-enriched piRNAs are similar, although hermaphrodite- 
enriched piRNAs are somewhat more biased toward the cluster on 
the right arm of chromosome IV than are male-enriched piRNAs 
(Fig. 4D; Kato et al. 2009). In contrast to C. elegans, all C. briggsae 
piRNAs enriched in hermaphrodites are clustered on chromosome 
IV 0-6.9 Mb, the region in synteny with C. elegans piRNA clusters. 
However, almost all C. briggsae male-enriched piRNAs are located 
in the chromosome IV 13.1-15.1 Mb cluster or the chromosome 
I 9.9-11.3 Mb cluster (Fig. 4D). The nonoverlapping genomic 
distributions of C. briggsae hermaphrodite- and male-enriched 
piRNAs not only suggest that they have distinct evolutionary 
trajectories but also points to the possibility that differences in 
chromosomal-scale epigenetic marks may contribute to the sex- 
biased expression of piRNAs. 

Given that the majority of piRNAs are differentially expressed, 
we asked whether the hermaphrodite/female- and male-enriched 
piRNAs have different sets of target genes. Indeed, when we allowed 
for up to three mismatches, genes containing target sites for her- 
maphrodite- or male-enriched piRNAs are largely nonoverlapping 
in C. elegans (Fig. 4E). In microarray-based studies, —5000 genes 
have been identified that are highly expressed in the germline, 
during either oogenesis or spermatogenesis (Reinke et al. 2000, 
2004). We found that genes containing target sites for hermaphro- 
dite- or male-enriched piRNAs were neither enriched nor depleted 
for germline or oogenesis genes. However, genes containing target 
sites for male-enriched piRNAs were significantly depleted of sper- 
matogenesis genes (Fig. 4E). This suggests a selective pressure on 
male-enriched piRNAs to avoid targeting genes that are important 
for spermatogenesis. It is unclear why we do not observe a similar 
depletion of hermaphrodite-enriched piRNA target sites in genes 
important for oogenesis. 

Conservation of 22G siRNA pathways 

There are two distinct classes of 22G siRNAs in C. elegans which can 
be classified by their Argonaute binding partners. WAGO class 22G 
siRNAs silence certain protein-coding genes, transposons, pseu- 
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Figure 4. Two distinct classes of piRNAs in each species. (A) Scatter plots display the levels of piRNAs in adult hermaphrodites or adult females+males 
(x-axis) and early embryos (/-axis). (B) Scatter plots display the levels of piRNAs in adult hermaphrodites or adult females+males (x-axis) and adult males 
(/-axis). (C) Venn diagrams show the number of piRNAs that are greater than threefold enriched in hermaphrodites/females+males (left) or males (right). 
piRNAs shown in the overlapping section have similar expression levels in hermaphrodites/females and males. (D) Distribution of hermaphrodite- or male- 
enriched piRNA loci per 1 00 kb in C elegans and C briggsae. (E) The bar diagram displays the percentage of C elegans genes containing hermaphrodite- or 
male-enriched piRNA target sites that are expressed in the germline, during oogenesis or during spermatogenesis. Asterisks indicate that the difference is 
statistically significant (P< 0.001, x 2 test). The Venn diagram shows the numbers of C elegans genes containing target sites for hermaphrodite- or male- 
enriched piRNAs in purple and green, respectively. 



dogenes, and cryptic loci to protect genome integrity (Gu et al. 
2009). CSR-1 class 22G siRNAs have a role in regulating chromo- 
some segregation (Claycomb et al. 2009; van Wolfswinkel et al. 
2009) and might provide a memory of self to prevent endogenous 
genes from being silenced by the piRNA surveillance pathway, al- 
though data supporting this is lacking (Lee et al. 2012; Shirayama 
et al. 2012). 

Approximately 85% of CSR-1 targets in C. elegans have a 1:1 
ortholog in at least one of the other three Caenorhabditis species we 
analyzed, whereas —60% of the total coding genes in C. elegans 
have 1:1 ortholog in at least one other species. Similar to the av- 
erage conservation of protein-coding genes, —60% of WAGO tar- 
gets have a 1 : 1 ortholog in at least one of the other species analyzed 
(Fig. 5 A). Approximately 50%-70% of C. elegans CSR-1 target genes 
also produce siRNAs in orthologs in each of the other species (Fig. 



5 A). We also observed a modest but significant (P < 10 -11 ) positive 
correlation in 22G siRNA levels among orthologous CSR-1 target 
genes between C. elegans and each of the other species (Fig. 5B). In 
contrast, targeting by WAGO class siRNAs is more poorly con- 
served. Only — 20%-30% of WAGO targets in C. elegans produce 
siRNAs in one of the other species analyzed, and there is lower 
correlation in siRNA levels among orthologous WAGO target genes 
between species (P = 0.002-0.1) (Fig. 5A,C). Because 22G siRNAs 
are biochemically indistinguishable from one another, we can- 
not rule out the possibility that in C. briggsae, C. remanei, and 
C. brenneri, a fraction of C. elegans CSR-1 targets are, instead, 
WAGO targets and vice versa. Nonetheless, these results point to 
relatively high conservation of targeting by CSR-1 class siRNAs and 
poor conservation of targeting by WAGO class siRNAs. WAGO class 
siRNAs, which are likely produced through successive rounds of 
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Figure 5. Conservation of CSR-1 and WAGO class 22G siRNA pathways. (A) The percentages of C elegans CSR-1 or WAGO target genes that contain an 
ortholog in C briggsae, C remanei, or C brenneri, as well as the percentage of orthologous genes that produce 22G siRNAs in each species, is shown. (B) 
Scatter plots display conserved CSR-1 target genes having at least one siRNA read as a function of the log 2 22G siRNA reads in C elegans versus each of the 
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- 2 was used for P-value calculation. (C) Same as B, but conserved WAGO target genes are plotted. (D) Box plot displays the levels of 22G siRNAs 
generated from conserved and nonconserved CSR-1 and WAGO target genes in C elegans. 



siRNA amplification, are more abundant than CSR-1 class siRNAs 
in C. elegans (Claycomb et al. 2009; Gu et al. 2009; Phillips et al. 
2012). We observed that WAGO target genes, on average, produce 
higher levels of 22G siRNAs than CSR-1 target genes, regardless of 
whether or not the genes are conserved. Moreover, nonconserved 
WAGO targets tend to produce more siRNAs than conserved 
WAGO targets (P < 0.0001), whereas CSR-1 targets tend to pro- 
duce similar levels of siRNAs regardless of their conservation (P = 
0.13) (Fig. 5D). 

Embryo and sperm class 26G siRNA pathways are conserved 
in Caenorhabditis species 

There are two distinct classes of 26G siRNAs in C. elegans. One class 
is enriched in oocytes and embryos and associates with ERGO-1 
(Han et al. 2009; Vasale et al. 2010; Fischer et al. 2011). The other 



class associates with ALG-3 and ALG-4 during spermatogenesis 
(Han et al. 2009; Conine et al. 2010). Both classes are thought to 
function by triggering 22G siRNA formation and subsequent si- 
lencing of target mRNAs. Because each class of 26G siRNAs is sex- 
and stage-specific, our embryo and adult male small RNA libraries 
are each enriched for one of the two classes. 

From our C. elegans embryo small RNA library, we defined a set 
of 29 26G siRNA-yielding genes, which produced at least 10 26G 
siRNA reads per million total reads (RPM), 26 of which were pre- 
viously annotated as ERGO-1 targets (Han et al. 2009; Vasale et al. 
2010; Fischer et al. 2011). In addition, we identified 121 embryo 
26G siRNA-yielding genes in C. briggsae, 71 in C. remanei, and 272 
in C. brenneri. None of the 26G siRNA sequences are conserved 
among species. Although 2%-12% of embryo 26G siRNA-yielding 
genes are conserved (Fig. 6A), in each instance only one ortholog 
yields 26G siRNAs. 
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Figure 6. Features of 26G siRNAs are conserved. (A) The proportions of conserved embryo and sperm class 26G siRNA target genes are shown for each 
species. (B) The percentages of C elegans sperm class 26G siRNA target genes that contain an ortholog in C briggsae, C. remanei, or C brenneri, as well as 
the percentage of orthologous genes that produce 26G siRNAs in each species, are shown. (C) Box plots show the level of 22G siRNAs generated from 
embryo and sperm class 26G siRNA targets in adult hermaphrodites or females+males (purple), embryos (green), and adult males (gray). (D) Distribution 
of embryo and sperm class 26G siRNAs across each C elegans or C briggsae chromosome is overlaid in one plot. Position is relative to the total length of 
each chromosome. (£) Venn diagrams show the numbers of embryo and sperm class 26G siRNA targets and their overlap in each of the four species. 



In contrast to the poor conservation of embryo class 26G 
siRNA targets, —80% (—50% 1:1 orthologs) of C. elegans sperm 
class 26G target genes have a potential ortholog in C. briggsae, 
compared to —70% (—60% 1:1 orthologs) of all coding genes. This 
suggests that sperm class 26G target genes are not particularly 
conserved or fast evolving. Despite average conservation (—70%) 
of sperm class 26G siRNA targets, we observed a disproportionally 
low tendency for more than one ortholog to produce 26G siRNAs. 
Only —38% of C. elegans genes that produce sperm class 26G 
siRNAs are conserved and also produce >1 RPM 26G siRNAs in 
C. briggsae (Fig. 6B). In C. remanei and C. brenneri, <15% of C. elegans 
genes are conserved and also produce 26G siRNAs (Fig. 6B). These 
results indicate that distinct cohorts of genes are targeted by 26G 
siRNAs during spermatogenesis and embryogenesis in each of the 
different nematode species. 

In all four nematode species, embryo class 26G siRNA target 
genes also produce 22G siRNAs (Fig. 6C). In each species, these 
secondary 22G siRNAs are enriched in embryos relative to adults 
(Fig. 6C). Similarly, sperm class 26G siRNA target genes produce 
relatively high levels of 22G siRNAs that are enriched in adult 
males relative to embryos or hermaphrodites/males+females in 
each species (Fig. 6C). Thus, 26G siRNAs trigger stage-specific 
production of 22G siRNAs across nematode species. 

Although 26G siRNA targets do not appear to be conserved, 
we nonetheless asked whether embryo class 26G siRNA chromo- 



somal distribution is conserved. In C. elegans, oocyte/embryo class 
26G siRNAs tend to be generated from gene-poor arms of chro- 
mosomes (Fig. 6D; Vasale et al. 2010; Fischer et al. 2011). In C. 
briggsae, for which the DNA sequence has been assembled on 
chromosomes, embryo class 26G siRNAs also tend to derive from 
chromosome arms (Fig. 6D). In contrast, sperm class 26G siRNAs 
are more or less evenly distributed along each chromosome in both 
C. elegans and C. briggsae (Fig. 6D). Thus, despite very little con- 
servation of 26G siRNAs among species, the genomic distribution 
is very similar. 

Embryo and sperm class 26G siRNAs are produced from 
nearly completely nonoverlapping targets in C. elegans (Fig. 6E; 
Han et al. 2009). However, in each of the other three nematodes, 
we observed substantial overlap between the embryo and sperm 
class 26G siRNA-yielding genes (Fig. 6E). Furthermore, the distri- 
butions of 26G siRNAs on the common targets were very similar in 
embryos and males within each species (data not shown). The 
common target genes are nonconserved, and in C. briggsae they are 
located on the chromosomal arms. Thus, the features of 26G siRNA 
targets that produce siRNAs in both males and embryos are more 
similar to ERGO-1 class 26G siRNA targets than to ALG-3/4 26G 
siRNA targets. It is interesting to note that ergo-1 appears to have 
undergone extensive gene duplications in C. briggsae, C. remanei, 
and C. brenneri after diverging from their common ancestor (Sup- 
plemental Fig. SI; Table 1). It is plausible that ergo-1 paralogs ac- 
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quired new expression patterns or functions that account for the 
relatively high overlap between sperm and embryo 26G siRNA 
targets in species other than C. elegans. 

Discussion 

Using a deep-sequencing comparative genomics approach, we 
identified —25 million distinct small RNAs from gravid adults and 
developing embryos of the four nematode species C. elegans, C. 
briggsae, C. remanei, and C. brenneri. Our results indicate that each of 
the major classes of small RNAs is conserved across Caenorhabditis 
species, although the degree of conservation differs tremendously 
miRNAs are by far the most highly conserved class of small RNAs 
based on sequence. CSR-1 class 22G siRNAs are typically not con- 
served at the sequence level, yet they derive from a largely over- 
lapping panel of genes in each species, indicating that the features 
that route CSR-1 targets into the siRNA pathway are conserved. 
piRNAs, WAGO class 22G siRNAs, and each of the classes of 26G 
siRNAs are not individually conserved but share many similarities 
in their biogenesis and function among the different species. 

Although piRNA sequences are not at all conserved, other 
features, including the upstream motifs (de Wit et al. 2009), ge- 
nomic clustering, and the tendency to trigger secondary 22G 
siRNA production from target genes, are conserved. C. elegans 
piRNAs are thought to play a role in genome surveillance and 
protection against nonself nucleic acids (Bagijn et al. 2012; Lee 
et al. 2012; Shirayama et al. 2012). Our results suggest that the roles 
of piRNAs are conserved across nematodes. That there is essentially 
no overlap in piRNA sequences among different nematode species 
indicates that piRNAs themselves are under very little stabilizing 
selection. The sheer number of piRNAs (likely > 15,000 in each 
species) would allow them to target most, if not all, genes if mul- 
tiple mismatches are permissible and would, therefore, not require 
sequence conservation of individual piRNAs. Indeed, rapid evo- 
lution of piRNAs would be advantageous in an arms race against 
invading nucleic acids. 

We found that gonochoristic species that mate at every gen- 
eration possess much larger numbers of piRNAs than androdioe- 
cious species that primarily self-fertilize. In Drosophila, maternally 
deposited piRNAs can silence paternal transposons (Brennecke et al. 
2008). Moreover, crosses between Drosophila strains that differ in 
the presence of a particular transposon result in sterile progeny only 
if the transposon is inherited from the paternal genome (Brennecke 
et al. 2008). Similarly, desilencing of paternal transposons was ob- 
served in interspecies Arabidopsis crosses (Josefsson et al. 2006). It is 
possible that gonochoristic nematodes require a larger repertoire of 
piRNAs than androdioecious nematodes in order to defend against 
the greater diversity of paternal transposons encountered during 
mating. Moreover, the gonochoristic nematodes also have nearly 
twice as many Argonautes (37 in C. remanei and 47 in C. brenneri) as 
the androdioecious nematodes (24 in C. briggsae and 25 in C. elegans) 
and have undergone extensive proliferation of the PIWI clade 
Argonautes ERGO and PRG (Table 1; Supplemental Fig. SI). 

Most piRNAs are differentially expressed in hermaphrodites/ 
females and males in each of the four species tested. Differences in 
the piRNA repertoire of C. elegans males and hermaphrodites may 
result from distinct upstream sequence motifs that drive sex-spe- 
cific expression (M Freeberg and J Kim, pers. comm.). The male and 
female/hermaphrodite piRNAs target a largely nonoverlapping set 
of genes. Spermatogenesis genes are depleted of male-enriched 
piRNA targets, suggesting selective pressure on male-enriched 
piRNAs to avoid targeting genes that are highly expressed during 



spermatogenesis. It will be important to distinguish the specific 
roles of male and female piRNAs. 

If the role of CSR-1 class siRNAs is to provide a memory of self 
as has been proposed (Bagijn et al. 2012; Lee et al. 2012; Shirayama 
et al. 2012), then one would expect targeting by CSR-1 to be highly 
conserved. This is, indeed, what we observed. Targeting by CSR-1 
class 22G siRNAs does appear to be highly conserved despite the 
individual siRNAs lacking conservation. In contrast, WAGO class 
22G siRNAs are thought to target aberrant and foreign RNAs, and 
targeting of annotated protein-coding genes by this class of small 
RNAs is more poorly conserved. 

Embryo class 26G siRNAs primarily target recently evolved genes 
with very little sequence conservation, and thus, it is not surprising 
that their targets are completely distinct in each species. Sperm class 
26G siRNAs target many conserved genes, yet orthologous genes of- 
ten do not produce 26G siRNAs. In spite of this, 26G siRNAs share 
many features in common among different nematode species, in- 
cluding genomic distribution, low abundance, sex specificity, and the 
propensity to trigger stage-specific 22G siRNA production. 

Through phylogenetic analysis, we identified a distinct 
subfamily of RdRPs, RRF-4, which is present in each of the Cae- 
norhabditis species we analyzed except for C. elegans. We were 
unable to link a distinct class of siRNAs to RRF-4 from analysis of 
our small RNA libraries, likely because RRF-4-dependent siRNAs 
are biochemically indistinguishable from other classes of siRNAs 
present in these libraries. It will be important to dissect the role of 
RRF-4 once the necessary genetic resources are available to do so 
in nf-4-containing species. 

The evolutionary fluidity of the different classes of small RNAs 
and their effectors in Caenorhabditis underscores their importance in 
the regulation, surveillance, and evolution of the genome. 

Methods 

Nematode strains 

Nematode strains used in this study: C. elegans N2, C. briggsae AF16, 
C. remanei PB4641, and C. brenneri PB2801. Worms were cultured 
with bacterial strain OP50 on modified nematode growth medium 
(Andersen et al. 2012) containing 1% agar and 0.7% agarose to 
prevent burrowing of C. brenneri. All strains were grown at 20°C. 

Phylogenetic analyses of Argonautes and RdRPs 

To obtain a comprehensive and nonredundant list of Argonaute 
proteins and RdRPs in C. briggsae, C. remanei, and C. brenneri, we 
did BLASTP searches in the National Center for Biotechnology 
Information (NCBI) database using the protein sequences of 
C. elegans alg-1 and rrf-1 as queries for Argonautes and RdRPs, 
respectively. We obtained Argonaute orthologs with £-values 
<1 X 10~ 10 and query coverage >35%, and RdRP orthologs with 
£-values <1 X 10~ 113 and query coverage >70%. Using different 
Argonautes or RdRPs as queries did not affect the outcome. Mul- 
tiple sequence alignments were then performed using CLUSTALW 
slow/accurate alignments with default settings (Larkin et al. 2007). 
We constructed unrooted trees with the outputs of multiple se- 
quence alignments, using the PHYLIP DRAWTREE program. 

High-throughput sequencing and data analysis 

For the construction of adult hermaphrodite/male+female, male or 
embryo small RNA libraries, animals were grown at 20°C for 72-74 h 
post-Ll synchronization and harvested as day one gravid adult 
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hermaphrodites for C. elegans and C. briggsae and mixed pop- 
ulations of adult males and females for C. remanei and C. brenneri. 
For male isolation, 150—200 day one adult males were hand- 
picked from the NGM plate. Embryos were harvested by bleach 
treatment of —15,000 gravid adults. Total RNA was isolated by 
dounce homogenization of worms in TRI Reagent, followed by 
chloroform extraction and isopropanol precipitation. Small RNA 
high-throughput sequencing libraries were prepared as described 
(Montgomery et al. 2012). Briefly, 18- to 28-nt small RNAs were 
size-selected and treated with 20 U Tobacco Acid Phosphatase 
(Epicenter) at 37°C for 2 h to digest 5' tri- and diphosphates to 
monophosphates. Small RNAs were then ligated to the 3' adapter 
using T4 RNA ligase 2 truncated (NEB) for 1 6 h at 1 6°C. 5 ' ligations 
were done with T4 RNA ligase 1 (NEB) for 16 h at 16°C. Adapter- 
ligated RNAs were then reverse-transcribed and PCR-amplified 
using Illumina's TruSeq RNA Indexing PCR primers. Small RNA 
amplicons were size-selected by gel purification and subjected to 
Illumina HiSeq sequencing. Small RNA sequences were parsed 
using a custom Python program to remove adapter sequences and 
then mapped to the corresponding nematode reference genome 
(WormBase release WS230) allowing for 0 mismatches using 
Bowtie software (Langmead et al. 2009). For sequences mapping to 
multiple genomic loci, the total number of reads was divided by 
the number of genomic loci. Small RNA reads were then normal- 
ized to the total number of millions of mapped reads (i.e., reads per 
million). 

Small RNA classification 

C. elegans small RNAs were classified as described (Zhang et al. 
201 1). Briefly, published data sets were parsed for classifying targets 
of C. elegans WAGO, CSR-1, ERGO-1, and ALG-3/4 class siRNAs. 
WAGO class was defined as the siRNAs produced from 1112 genes 
depleted of siRNAs in WAGO-1-12, rde-3, and mut-7 (Gu et al. 

2009) . CSR-1 class was defined as siRNAs produced from the 4191 
genes that yield siRNAs enriched in CSR-1 immunoprecipitates 
(Claycomb et al. 2009). ALG-3/4 (401 genes) class and ERGO-1 
(75 genes) class were defined as 26G siRNAs enriched during 
spermatogenesis and oogenesis/embryogenesis, respectively (Han 
et al. 2009; Conine et al. 2010; Vasale et al. 2010). miRNAs were 
predicted using miRDeep2 with default settings (Mackowiak 2011; 
Friedlander et al. 2012). To identify new miRNAs with high con- 
fidence, we further required a candidate miRNA predicted by 
miRDeep2 to also be predicted by MIReNA (Mathelier and Carbone 

2010) and/or possess a seed sequence conserved between Caeno- 
rhabditis species. piRNAs/21U-RNAs were predicted as described 
(Ruby et al. 2006), using a scoring matrix based on the consensus 
motif, spacer sequence length, and 5' U features of piRNAs. piRNA 
saturation analysis was performed by taking a random subset of 
sequencing reads with increasing size and calculating the per- 
centage of piRNAs identified from this sublibrary. 

Gene orthology analyses 

Orthologous genes were identified using InParanoid4, which al- 
lows both one-to-one and many-to-many orthology cases (O'Brien 
et al. 2005). For clarity, we only kept one-to-one orthologs to an- 
alyze the conservation of different classes of small RNA target 
genes. 

Statistical analyses 

For analysis of C. elegans genes containing hermaphrodite- or male- 
enriched piRNA target sites in Figure 4E, the \ 2 test (df = 1) was 
performed. For the comparison of 22G siRNA levels among 



orthologous CSR-1 or WAGO target genes between C. elegans and 
each of the other three species, Pearson's r correlation and corre- 
sponding P- values were calculated using R. P- values for comparing 
22G siRNA levels produced from C. elegans CSR-1 and WAGO tar- 
get genes were calculated using the Welch's t-test. 

Data access 

All high-throughput sequencing data have been submitted to the 
NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm. 
nih.gov/geo/) under accession number GSE41461. 
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